# Document QA
The following models are listed under the Document QA category.

| Model | Author | License | Task | Framework / Language | Downloads | Likes | Description |
|---|---|---|---|---|---|---|---|
| Donut Sroie Company Sample Demo | Chan-yeong | MIT | Question Answering | English | 22 | 0 | Donut is a Transformer-based document understanding model designed for document question-answering tasks. |
| Cogvlm2 Llama3 Chat 19B Int4 | THUDM | Other | Multimodal Dialogue | Transformers / English | 467 | 28 | CogVLM2 is a multimodal dialogue model based on Meta-Llama-3-8B-Instruct, supporting both Chinese and English, with an 8K context length and 1344×1344 image resolution. |
| Layoutlmv2 Base Uncased Finetuned Docvqa | rogdevil | | Document Question Answering | Transformers | 16 | 0 | A document visual question answering (VQA) model based on Microsoft's LayoutLMv2 architecture, fine-tuned for document understanding tasks. |
| Layout Qa Hparam Tuning | PrimWong | | Question Answering | Transformers | 14 | 0 | A document QA model fine-tuned from microsoft/layoutlmv2-base-uncased, suited to document layout understanding and QA tasks. |
| Lilt Document QA | TusharGoel | MIT | Document Question Answering | Transformers / English | 80 | 3 | LiLT is a pre-trained model for Document Visual Question Answering (DocVQA), designed for question answering over English documents. |
| Layoutlm Document Qa | impira | MIT | Document Question Answering | Transformers / English | 26.10k | 1,102 | A fine-tuned multimodal LayoutLM model for document question answering, able to use both the text and the layout of a document to answer questions. |
| Bert Base Uncased Finetuned Docvqa | tiennvcs | Apache-2.0 | Question Answering | Transformers | 60 | 1 | A BERT-based model fine-tuned on the Document Visual Question Answering (DocVQA) task. |
| Layoutlmv2 Base Uncased Finetuned Docvqa | tiennvcs | | Document Question Answering | Transformers | 983 | 14 | A document visual question answering model based on the LayoutLMv2 architecture, fine-tuned for document understanding tasks. |
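For the extractive LayoutLM-family models above (for example impira's Layoutlm Document Qa), the quickest way to try them is the Transformers `document-question-answering` pipeline. The sketch below is a minimal example, assuming `transformers`, `pillow`, and the `pytesseract` OCR backend are installed; the file name `invoice.png` and the question are illustrative placeholders, not part of the listing.

```python
# Minimal sketch: extractive document QA with the Transformers pipeline.
# Assumes: pip install transformers pillow pytesseract, plus a local
# invoice.png (file name and question are illustrative only).
from transformers import pipeline

doc_qa = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",  # model listed above under impira
)

result = doc_qa(
    image="invoice.png",                   # the pipeline runs OCR internally
    question="What is the invoice number?",
)
print(result)  # e.g. [{'score': ..., 'answer': ..., 'start': ..., 'end': ...}]
```

The pipeline extracts an answer span from the OCR text, using the page layout as an additional signal.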
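The Donut entry at the top of the list works differently: it is OCR-free and generates the answer with an encoder-decoder steered by a task prompt. The sketch below is a minimal, hedged example that uses the public naver-clova-ix/donut-base-finetuned-docvqa checkpoint as a stand-in for the SROIE demo listed above; the image path and question are placeholders.

```python
# Minimal sketch of OCR-free document QA with Donut (generative, prompt-driven).
# Assumes: the naver-clova-ix/donut-base-finetuned-docvqa checkpoint as a
# stand-in for the SROIE demo listed above, and a local document.png.
import re
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

checkpoint = "naver-clova-ix/donut-base-finetuned-docvqa"
processor = DonutProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("document.png").convert("RGB")
question = "What is the total amount?"
# Donut is steered by an XML-like task prompt rather than a separate QA head.
task_prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"

pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids.to(device)

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "")
sequence = sequence.replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task start token
print(processor.token2json(sequence))  # {'question': ..., 'answer': ...}
```

Because the answer is decoded token by token from the image, no OCR dependency is needed, which is the main practical difference from the extractive LayoutLM-style models above.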